Stochastic Gradient Descent with Only One Projection
Authors
Abstract
Although many variants of stochastic gradient descent have been proposed for large-scale convex optimization, most of them require projecting the solution at each iteration to ensure that the obtained solution stays within the feasible domain. For complex domains (e.g., the positive semidefinite cone), the projection step can be computationally expensive, making stochastic gradient descent unattractive for large-scale optimization problems. We address this limitation by developing novel stochastic optimization algorithms that do not need intermediate projections. Instead, only one projection at the last iteration is needed to obtain a feasible solution in the given domain. Our theoretical analysis shows that with high probability, the proposed algorithms achieve an O(1/√T) convergence rate for general convex optimization, and an O(ln T/T) rate for strongly convex optimization under mild conditions on the domain and the objective function.
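To make the structure concrete, the following is a minimal Python sketch of the one-projection idea: run plain SGD without any intermediate projection, average the iterates, and project only the final average onto the feasible domain. The oracle names (grad_oracle, project) and the unmodified gradient step are illustrative assumptions; the paper's actual algorithm additionally penalizes constraint violation to obtain its high-probability guarantees.

```python
import numpy as np

def sgd_one_projection(grad_oracle, project, x0, T, eta):
    """Run T unprojected SGD steps; project only the averaged iterate.

    grad_oracle(x) -> stochastic (sub)gradient at x   (assumed interface)
    project(x)     -> Euclidean projection onto the domain (assumed)
    """
    x = np.asarray(x0, dtype=float).copy()
    avg = np.zeros_like(x)
    for t in range(1, T + 1):
        g = grad_oracle(x)        # stochastic gradient
        x = x - eta * g           # update with no projection
        avg += (x - avg) / t      # running average of the iterates
    return project(avg)           # the single projection
```

With eta on the order of 1/√T, this sketch matches the O(1/√T) regime stated above, though the stated guarantee holds for the paper's augmented update rather than this plain one.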
Similar Resources
Efficient Stochastic Gradient Descent for Strongly Convex Optimization
This study is motivated by recent work on a stochastic gradient descent (SGD) method with only one projection (Mahdavi et al., 2012), which aims to alleviate the computational bottleneck of the standard SGD method, namely the projection performed at each iteration, and enjoys an O(log T/T) convergence rate for strongly convex optimization. In this paper, we make further contributions along th...
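For the strongly convex setting, the O(log T/T) rate is associated with the classical decaying step size 1/(λt). A hedged sketch under that assumption (again with illustrative oracle names, not the exact method of Mahdavi et al.):

```python
import numpy as np

def sgd_strongly_convex(grad_oracle, project, x0, T, lam):
    """SGD for a lam-strongly convex objective with step size
    1/(lam * t); the projection is deferred to the very end."""
    x = np.asarray(x0, dtype=float).copy()
    avg = np.zeros_like(x)
    for t in range(1, T + 1):
        x = x - grad_oracle(x) / (lam * t)  # classical decaying step
        avg += (x - avg) / t                # average the iterates
    return project(avg)                     # single final projection
```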
Conditional Accelerated Lazy Stochastic Gradient Descent
In this work we introduce a conditional accelerated lazy stochastic gradient descent algorithm with an optimal number of calls to a stochastic first-order oracle and a convergence rate of O(1/ε²), improving over the projection-free, Online Frank-Wolfe based stochastic gradient descent of Hazan and Kale [2012], which converges at rate O(1/ε⁴).
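The projection-free line of work referenced here replaces the projection with a linear minimization oracle (LMO). Below is a sketch of the basic stochastic Frank-Wolfe template, not the conditional accelerated lazy variant of the paper; grad_oracle and lmo are assumed interfaces, and in practice the gradient is usually averaged or mini-batched to control variance.

```python
import numpy as np

def stochastic_frank_wolfe(grad_oracle, lmo, x0, T):
    """Projection-free update: each step solves a linear problem
    over the feasible set instead of computing a projection.

    lmo(g) -> argmin over the domain of <g, s>   (assumed interface)
    """
    x = np.asarray(x0, dtype=float).copy()
    for t in range(T):
        g = grad_oracle(x)
        s = lmo(g)                    # linear oracle call, no projection
        gamma = 2.0 / (t + 2)         # standard Frank-Wolfe step size
        x = (1 - gamma) * x + gamma * s
    return x
```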
One Network to Solve Them All — Solving Linear Inverse Problems using Deep Projection Models
We now describe the architecture of the networks used in the paper. We use the exponential linear unit (ELU) [1] as the activation function. We also use virtual batch normalization [6], where the reference batch size b_ref is equal to the batch size used for stochastic gradient descent. We weight the reference batch by b_ref/(b_ref + 1). We define some shorthands for the basic components used in the networks.
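As one concrete reading of the b_ref/(b_ref + 1) weighting, here is a sketch of virtual batch normalization for a single example: the statistics are those of the reference batch together with the current example, so the reference batch carries weight b_ref/(b_ref + 1). The function name and epsilon are assumptions, not the paper's code.

```python
import numpy as np

def virtual_batch_norm(x, ref_batch, eps=1e-5):
    """Normalize example x with the statistics of ref_batch plus {x}:
    the reference batch is weighted by b_ref/(b_ref + 1) and the
    current example by 1/(b_ref + 1)."""
    b_ref = ref_batch.shape[0]
    w = b_ref / (b_ref + 1.0)
    mu_ref = ref_batch.mean(axis=0)
    var_ref = ref_batch.var(axis=0)
    # exact mean and population variance of the combined set
    mean = w * mu_ref + (1.0 - w) * x
    var = w * (var_ref + (mu_ref - mean) ** 2) + (1.0 - w) * (x - mean) ** 2
    return (x - mean) / np.sqrt(var + eps)
```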
Optimal Stochastic Strongly Convex Optimization with a Logarithmic Number of Projections
We consider stochastic strongly convex optimization with a complex inequality constraint. Such a constraint may lead to computationally expensive projections in the iterations of stochastic gradient descent (SGD) methods. To reduce the computational cost of these projections, we propose an Epoch-Projection Stochastic Gradient Descent (Epro-SGD) method. The p...
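A hedged sketch of the epoch idea: run unprojected SGD inside each epoch and project once per epoch; with doubling epoch lengths, T total iterations require only O(log T) projections. The doubling/halving schedule and interfaces below are illustrative, not the paper's exact parameters.

```python
import numpy as np

def epoch_projection_sgd(grad_oracle, project, x0, num_epochs, T1, eta1):
    """One projection per epoch; doubling epoch lengths make
    num_epochs = O(log T) for T total iterations."""
    x = np.asarray(x0, dtype=float).copy()
    T_k, eta_k = T1, eta1
    for _ in range(num_epochs):
        y, avg = x.copy(), np.zeros_like(x)
        for t in range(1, T_k + 1):
            y = y - eta_k * grad_oracle(y)  # no projection inside the epoch
            avg += (y - avg) / t            # within-epoch average
        x = project(avg)                    # the epoch's single projection
        T_k *= 2                            # double the epoch length
        eta_k /= 2.0                        # halve the step size
    return x
```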
Random Multi-Constraint Projection: Stochastic Gradient Methods for Convex Optimization with Many Constraints
Consider convex optimization problems subject to a large number of constraints. We focus on stochastic problems in which the objective takes the form of an expected value and the feasible set is the intersection of a large number of convex sets. We propose a class of algorithms that perform both stochastic gradient descent and random feasibility updates simultaneously. At every iteration, the alg...
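A minimal sketch of the alternating scheme: each iteration takes one stochastic gradient step and then a (possibly relaxed) projection onto a single randomly sampled constraint set, rather than onto the full intersection. The list-of-projections interface and the relaxation parameter beta are assumptions.

```python
import random
import numpy as np

def sgd_random_feasibility(grad_oracle, projections, x0, T, eta, beta=1.0):
    """Interleave SGD steps with random feasibility updates.

    projections: list of maps, each projecting onto ONE convex set
    beta: relaxation; beta = 1.0 moves fully onto the sampled set
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(T):
        x = x - eta * grad_oracle(x)        # stochastic gradient step
        proj = random.choice(projections)   # sample one constraint set
        x = x + beta * (proj(x) - x)        # random feasibility update
    return x
```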
Publication date: 2012